We study the problem of a buyer (aka auctioneer) who gains
stochastic rewards by procuring multiple units of a service or
item from a pool of heterogeneous strategic agents. The reward
obtained for a single unit from an allocated agent depends on
the inherent quality of the agent; the agent's quality is fixed
but unknown. Each agent can only supply a limited number of
units (capacity of the agent). The costs incurred per unit and
capacities are private information of the agents. The auctioneer
is required to elicit costs as well as capacities (making the
mechanism design bidimensional) and further, learn the qualities of
the agents as well, with a view to maximize her utility. Motivated
by this, we design a bidimensional multi-armed bandit
procurement auction that seeks to maximize the expected utility
of the auctioneer subject to incentive compatibility and
individual rationality while simultaneously learning the unknown
qualities of the agents. We first assume that the qualities are
known and propose an optimal, truthful mechanism 2D-OPT
for the auctioneer to elicit costs and capacities. Next, in order
to learn the qualities of the agents in addition, we provide
sufficient conditions for a learning algorithm to be Bayesian
incentive compatible and individually rational. We finally design a
novel learning mechanism, 2D-UCB, that is stochastic Bayesian
incentive compatible and individually rational.


1 Introduction 

Auction based mechanisms are widely used to allocate goods
or services in the presence of strategic agents. In different
contexts, the auctioneer may have different goals such as welfare
maximization or utility maximization or revenue maximization
or cost minimization. Auction theory generally assumes that
the players are symmetric, which means they are distinguished
only by privately held types such as costs, valuations, or
capacities. The theory does not consider the "experience" of an
auctioneer resulting from the consumption of the commodity
or service. The experience can be uncertain and not known
upfront. For example, consider a hospital (auctioneer) interested
in procuring a large number of units of a single generic drug
from various pharmaceuticals who can supply limited quantities
at different production costs. The quality of the procured
generic drug from a supplier can depend on several parameters
such as methodology used in preparation and other parameters
which are inherent to the supplier. In this example and several
other real world scenarios, there is an inherent heterogeneity
amongst services or items procured from different agents.
Therefore, we can attribute to every agent an inherent quality
which is a measure of the perceived experience or reward. Thus,
in order to maximize her utility, the auctioneer needs to minimize
her payments and at the same time ensure a required quality
of service. If the qualities from different agents are observed
repeatedly, the auctioneer can learn the quality of the agents for
future optimization.

A strong motivation for this work comes from the setting of
crowdsourcing. The quality of human generated data or labels
is an important input for an AI process or a machine learning
system. With the advent of several crowdsourcing marketplaces,
such inputs are now obtained at much less cost from a global
pool of heterogeneous crowd workers. These human workers
have different quality levels and can be strategic about their
costs. The risk of low quality levels is mitigated via learning
algorithms which can predict high quality workers, while strategic
behavior of crowd workers can be addressed via mechanism
design. Thus, the auctioneer here is a requester who seeks to
procure tasks from strategic crowd workers with privately held
costs, privately held capacities, and unknown qualities.

Motivated by situations such as the above, we consider a procurement
scenario where a buyer (or auctioneer) wishes to procure
multiple units of a service or item from a pool of heterogeneous
agents with unknown qualities, privately held costs, and
privately held limited capacities. Our goal is to design a
procurement auction that learns the qualities of the agents, elicits
true costs and capacities from the agents, and maximizes the
expected utility of the auctioneer. If the agents are honest in
reporting their costs and capacities, the classical Multi-Armed
Bandit (MAB) techniques can be used to learn the qualities.
For example, Tran-Thanh et al. [29] have proposed a greedy
approach to learn the qualities of the crowd workers. On the other
hand, if all the agents have the same quality that is common
knowledge but with strategic costs and capacities, the auctioneer
can deploy the techniques available in the literature [11, 16]
to elicit true costs and capacities. In the setting considered in
this paper, in addition to strategic costs and capacities, we also
address heterogeneity amongst agents and moreover we learn
their qualities.

Learning in the presence of strategic agents in a multi-armed
bandit (MAB) setting leads to MAB mechanisms [4]. In this
paper, we take a detour from current MAB mechanism theory
in two ways: (i) we propose an optimal MAB mechanism that
performs nearly as well as an optimal auction with full
information, whereas the current literature mainly focuses on social
welfare maximization; (ii) we provide a characterization for a
weaker notion of truthfulness, i.e. stochastic Bayesian incentive
compatibility, that can potentially achieve better regret bounds.
More importantly, while the existing research is also limited to
learning with agents having single dimensional private
information, we design an MAB mechanism when the agents' private
information is two dimensional. In particular, the following are the
contributions of this paper:


• We first explore the case of heterogeneous agents with known
qualities and provide a characterization for any Bayesian
Incentive Compatible (BIC) and Individually Rational (IR)
mechanism in a bidimensional setting. Using this characterization,
we provide sufficient conditions for a mechanism to be
BIC and IR and to maximize the expected utility of the auctioneer
(Theorem 2). We then propose an optimal mechanism 2D-OPT
which is in fact dominant strategy incentive compatible
(DSIC) and IR (Theorem 3).

• We next take up the case when the qualities are unknown and
derive sufficient conditions for an allocation rule to be
implemented in stochastic BIC and IR (Theorem 6).¹ This leads
to a learning mechanism 2D-UCB that is stochastic BIC and
IR (Theorem 9). We evaluate 2D-UCB through simulations
and show that the expected utility of an auctioneer adopting
the 2D-UCB mechanism approaches that of the omniscient
2D-OPT.

2 Positioning of our Work 

An extensive study of auction theory and mechanism design can
be found in [18]. The notion of optimal auction was introduced
by Myerson [22]. Subsequently, there were many significant
results in single parameter domains; however, the multiple
parameter domain was unexplored until recently. The readers are
referred to [12, 21] for more details on optimal multi-dimensional
mechanism design. The settings addressed in most of the
literature assume additive valuation. In our work, cost and
capacity parameters constitute the private information and the
valuation of the agents is not additive in these two parameters.
Notably, Iyengar and Kumar [16] have designed an optimal single
item multi-unit auction for capacitated bidders and this is
further developed by Gujar and Narahari [11] for multi-item
multi-unit auctions. However, as pointed out in Section 1, the above
works [11, 16] assume that all agents are of the same quality.
In our setting, the agents are heterogeneous and their qualities
need to be learnt.

If we assume honest agents, the multi-armed bandit theory [3,
19] is applicable to learn the qualities of the agents. Upper
confidence bound based algorithms have been designed to learn
unknown quantities with logarithmic regrets [8]. In the specific
context of crowdsourcing, much research has been carried out
for learning qualities of the crowd workers [1, 2, 7, 13, 14, 15, 26,
27, 28, 30]. In a pure learning setting devoid of strategic play,
the closest setting to ours is the one in Tran-Thanh et al. [29]
which studies the problem in the context of crowdsourcing to
maximize the number of successful tasks under a fixed budget.
Note that all the above papers assume costs are known.

¹ Note that this is a sufficient condition and the complete
characterization is still open.

A learning algorithm can be potentially manipulated by a
strategic agent so as to increase utility. This problem is
addressed using MAB mechanism design theory [4, 5, 6, 9, 10, 17,
20, 25]. Most of the literature in this space (except [5]) considers
strategic agents with single dimensional private information
and seeks to maximize social welfare. Our work, on the other
hand, seeks to maximize the expected utility of the auctioneer.
The work in [5] considers a multi-parameter setting and seeks to
maximize welfare, but with an additive valuation model where
the valuation of each agent is a linear combination of different
private values. Our work is different from [5] as we aim to design
an optimal auction in a capacitated setting where additive
valuations do not apply.

3 Notation and Preliminaries 

An auctioneer wishes to procure $L$ units of an item from an
agent pool $N = \{1, 2, \ldots, n\}$. Let $q_i \in [0,1]$ represent the quality
of agent $i$, let $c_i \in [\underline{c}_i, \bar{c}_i]$ be his true cost and let $k_i \in [\underline{k}_i, \bar{k}_i]$
represent the maximum number of units the agent can provide,
i.e. his true capacity. Let $q$, $c$, $k$ denote the vectors of qualities,
costs and capacities respectively. We consider a linear reward
function for the auctioneer: she obtains an expected reward
of $R q_i$ on procuring a unit from agent $i$, where $R$ is a fixed
positive real number.

In this work, we make an important and reasonable assumption
that the agent is not allowed to over-report his capacity.
This is because if the auctioneer allocates the agent beyond his
capacity, it is detected eventually when the agent fails to deliver.
This could lead to imposition of a high penalty or may
lead to blacklisting the agent from further participation. In
contrast to over-reporting, under-reporting of capacity cannot be
detected. In the absence of proper incentives, an agent can create
virtual scarcity of agents by under-reporting his capacity,
which can benefit him.

We denote the reported cost by $\hat{c}_i \in [\underline{c}_i, \bar{c}_i]$ and the reported
capacity by $\hat{k}_i \in [\underline{k}_i, k_i]$. Let $b_i = (\hat{c}_i, \hat{k}_i)$ denote the bid of agent
$i$, and let $b_{-i}$ denote the bid vector of all the agents except $i$.
The objective of the auctioneer is to maximize the expected
reward from $L$ units of the item and at the same time also
minimize the payments to the agents, ensuring that from each
agent $i$ at most $k_i$ units are procured. If all the parameters are
known, then one can solve the following optimization problem
which maximizes the utility of the auctioneer:

$$\max \sum_{i=1}^{n} \left( x_i R q_i - t_i \right) \quad \text{s.t. } x_i \in \{0, 1, \ldots, k_i\}, \ \sum_{i} x_i \le L, \qquad (1)$$

where $x_i$ represents the number of units that are procured
from agent $i$ and $t_i$ denotes the payment given to
agent $i$. The numbers of units procured from the agents,
$x = (x_1, x_2, \ldots, x_n)$ (the allocation), and the payments made to
the agents, $t = (t_1, t_2, \ldots, t_n)$, form the mechanism denoted by
$M = (x, t)$. Note that the allocation $x$ and payment $t$ depend
on the bids reported by the agents and on the qualities. We assume
an independent private value model, and that the joint
probability density function, denoted by $f_i(c_i, k_i)$, is common
knowledge. Let $X$ and $T$ denote the expected allocations and
expected payments when expectation is taken over bids of other
agents. That is, $X_i(c_i, k_i; q)$ represents the expected number of
units procured from agent $i$ when he bids cost per item $c_i$, bids
capacity $k_i$, and the qualities are $q$. The $T_i$'s are defined similarly. We
now define some desirable properties for a mechanism if qualities
were known.

Definition 1 (Bayesian Incentive Compatible) A mechanism
is called Bayesian Incentive Compatible (BIC) if reporting
truthfully gives an agent highest expected utility when the
other agents are truthful, with the expectation taken over type
profiles of other agents. Formally, $\forall i \in N$, $\forall c_i, \hat{c}_i \in [\underline{c}_i, \bar{c}_i]$, $\forall \hat{k}_i \in [\underline{k}_i, k_i]$,

$$U_i(c_i, k_i, c_i, k_i; q) \ge U_i(\hat{c}_i, \hat{k}_i, c_i, k_i; q),$$

where $U_i(\hat{c}_i, \hat{k}_i, c_i, k_i; q) = \mathbb{E}_{b_{-i}}\left[-c_i x_i(\hat{c}_i, \hat{k}_i; q) + t_i(\hat{c}_i, \hat{k}_i; q)\right]$.


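Definition 2 (Dominant Strategy Incentive Compatible)
A mechanism is called Dominant Strategy Incentive Compatible
(DSIC) if reporting truthfully gives every agent highest
utility irrespective of the bids of the other agents. Formally,
$\forall i \in N$, $\forall c_i, \hat{c}_i \in [\underline{c}_i, \bar{c}_i]$, $\forall \hat{k}_i \in [\underline{k}_i, k_i]$, $\forall \hat{c}_{-i}, \hat{k}_{-i}$,

$$u_i(c_i, \hat{c}_{-i}, k_i, \hat{k}_{-i}, c, k; q) \ge u_i(\hat{c}_i, \hat{c}_{-i}, \hat{k}_i, \hat{k}_{-i}, c, k; q),$$

where $u_i(\hat{c}_i, \hat{c}_{-i}, \hat{k}_i, \hat{k}_{-i}, c, k; q) = -c_i x_i(\hat{c}, \hat{k}; q) + t_i(\hat{c}, \hat{k}; q)$ is the
utility when the true type profile is $c, k$ and agent $i$ reports $\hat{c}_i, \hat{k}_i$.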

Definition 3 (Individually Rational) A mechanism is
called Individually Rational (IR) if no agent derives negative
utility by participating in the mechanism. Formally, $\forall i \in N$,
$\forall c_i \in [\underline{c}_i, \bar{c}_i]$, $\forall k_i \in [\underline{k}_i, \bar{k}_i]$,

$$U_i(c_i, k_i, c_i, k_i; q) \ge 0.$$

Definition 4 (Optimal Mechanism) A mechanism $M = (x, t)$
is called optimal if it maximizes eq. (1) subject to BIC
and IR.


4 Auction with Known Qualities 

We now derive the characterization for any mechanism to be 
BIC and IR when the qualities are known. 

4.1 Characterization 

In the setting considered in the paper, as described in Section 3,
VCG mechanisms can be used to elicit the costs and capacities
from the agents, and they satisfy DSIC and IR. However, VCG
mechanisms maximize social welfare and may or may not be utility
maximizing for the auctioneer [23].

Any allocation should be compensated with at least the cost
incurred by the agent, irrespective of the quality of the unit
procured. We propose to pay a premium to each agent above his
true cost so as to incentivize him to report costs and capacities
truthfully. We define, $\forall i \in N$,

$$\rho_i(b_i; q) = T_i(b_i; q) - \hat{c}_i X_i(b_i; q), \quad \text{where } b_i = (\hat{c}_i, \hat{k}_i).$$

The utility of an agent $i$ with bid $b_i$ is given as

$$U_i(b_i, c_i, k_i; q) = T_i(b_i; q) - c_i X_i(b_i; q) = \rho_i(b_i; q) - (c_i - \hat{c}_i) X_i(b_i; q). \qquad (2)$$

Thus $\rho_i$ represents the offered utility when all the agents are
truthful. With the above offered incentive, we have the following
theorem.

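Theorem 1 A mechanism is BIC and IR iff $\forall i \in N$,

1. $X_i(c_i, k_i; q)$ is non-increasing in $c_i$, $\forall q$ and $\forall k_i \in [\underline{k}_i, \bar{k}_i]$.

2. $\rho_i(c_i, k_i; q)$ is non-negative, and non-decreasing in $k_i$, $\forall q$ and $\forall c_i \in [\underline{c}_i, \bar{c}_i]$.

3. $\rho_i(c_i, k_i; q) = \rho_i(\bar{c}_i, k_i; q) + \int_{c_i}^{\bar{c}_i} X_i(z, k_i; q)\,dz$.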


We refer to the above three statements as conditions 1, 2 and 
3 respectively. 
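
As an illustration of conditions 1 and 3, the following Python sketch checks numerically that a payment of the form $T_i = c_i X_i + \int_{c_i}^{\bar{c}_i} X_i(z)\,dz$ (condition 3 with $\rho_i(\bar{c}_i, k_i; q) = 0$) makes truthful cost reporting optimal for a single agent with fixed capacity. The step-shaped expected allocation in `SEGMENTS`, the cost range, and all function names are hypothetical choices made for this example only; they are not taken from the paper.

```python
# Illustrative check of Theorem 1 (conditions 1 and 3) for one agent.
# Assumptions for this sketch: costs lie in [C_MIN, C_MAX], capacity is held
# fixed, and the expected allocation is a hypothetical non-increasing step
# function of the reported cost.
C_MIN, C_MAX = 1.0, 5.0

# (upper end of cost segment, expected units on that segment)
SEGMENTS = [(2.0, 3.0), (3.5, 1.0), (C_MAX, 0.0)]


def expected_units(cost):
    """Expected allocation X(cost); non-increasing in cost (condition 1)."""
    for upper, units in SEGMENTS:
        if cost < upper:
            return units
    return 0.0


def area_from(cost):
    """Exact integral of expected_units over [cost, C_MAX]."""
    area, left = 0.0, cost
    for upper, units in SEGMENTS:
        if upper > left:
            area += units * (upper - left)
            left = upper
    return area


def payment(bid):
    """T(bid) = bid * X(bid) + integral of X from bid to C_MAX."""
    return bid * expected_units(bid) + area_from(bid)


def utility(bid, true_cost):
    """Agent's expected utility: payment minus true cost of allocated units."""
    return payment(bid) - true_cost * expected_units(bid)


if __name__ == "__main__":
    grid = [C_MIN + 0.1 * j for j in range(41)]
    truthful_is_best = all(
        utility(c, c) >= utility(b, c) - 1e-9 for c in grid for b in grid
    )
    print("truthful bidding is a best response on the grid:", truthful_is_best)
```

In this sketch the truthful utility `utility(c, c)` equals `area_from(c)`, which is the offered utility $\rho_i$ of condition 3 under the stated normalization.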


Proof: To prove the necessity part, we first observe that, due to BIC,

$$U_i(\hat{c}_i, \hat{k}_i, c_i, k_i; q) \le U_i(c_i, k_i, c_i, k_i; q) \quad \forall (\hat{c}_i, \hat{k}_i) \text{ and } (c_i, k_i)$$

$$\therefore \ U_i(\hat{c}_i, k_i, c_i, k_i; q) \le U_i(c_i, k_i, c_i, k_i; q).$$

We assume $\hat{c}_i > c_i$. The proof follows along identical lines otherwise.
From eq. (2),

$$U_i(\hat{c}_i, k_i, c_i, k_i; q) = U_i(\hat{c}_i, k_i, \hat{c}_i, k_i; q) + (\hat{c}_i - c_i) X_i(\hat{c}_i, k_i; q),$$

which implies that

$$\frac{U_i(\hat{c}_i, k_i, \hat{c}_i, k_i; q) - U_i(c_i, k_i, c_i, k_i; q)}{\hat{c}_i - c_i} \le -X_i(\hat{c}_i, k_i; q).$$

Similarly, using $U_i(c_i, k_i, \hat{c}_i, k_i; q) \le U_i(\hat{c}_i, k_i, \hat{c}_i, k_i; q)$,

$$-X_i(c_i, k_i; q) \le \frac{U_i(\hat{c}_i, k_i, \hat{c}_i, k_i; q) - U_i(c_i, k_i, c_i, k_i; q)}{\hat{c}_i - c_i} \le -X_i(\hat{c}_i, k_i; q). \qquad (3)$$

Taking the limit $\hat{c}_i \to c_i$, we get

$$\frac{\partial U_i(c_i, k_i, c_i, k_i; q)}{\partial c_i} = -X_i(c_i, k_i; q). \qquad (4)$$

Equation (3) implies that $X_i(c_i, k_i; q)$ is non-increasing in $c_i$. This
proves condition 1 of the theorem in the forward direction.
When the worker bids truthfully, from Equation (2),

$$\rho_i(c_i, k_i; q) = U_i(c_i, k_i, c_i, k_i; q). \qquad (5)$$

For BIC, Equation (4) should be true. So,

$$\rho_i(c_i, k_i; q) = \rho_i(\bar{c}_i, k_i; q) + \int_{c_i}^{\bar{c}_i} X_i(z, k_i; q)\,dz. \qquad (6)$$

This proves condition 3 of the theorem. BIC also requires

$$k_i \in \arg\max_{\hat{k}_i \in [\underline{k}_i, k_i]} U_i(c_i, \hat{k}_i, c_i, k_i; q) \quad \forall c_i \in [\underline{c}_i, \bar{c}_i].$$

This implies that, $\forall c_i$, $\rho_i(c_i, k_i; q)$ should be non-decreasing in $k_i$.
The IR conditions (Equation (5)) imply

$$\rho_i(c_i, k_i; q) \ge 0.$$

This proves condition 2 of the theorem. Thus, these three conditions
are necessary for BIC and IR properties. We now prove
the sufficiency. Consider

$$U_i(c_i, k_i, c_i, k_i; q) = \rho_i(c_i, k_i; q) \ge 0.$$
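So the IR property is satisfied. We assume $\hat{c}_i > c_i$. The proof
is similar for the case $\hat{c}_i < c_i$. To establish BIC, consider:

$$\begin{aligned}
U_i(\hat{c}_i, \hat{k}_i, c_i, k_i; q)
&= \rho_i(\hat{c}_i, \hat{k}_i; q) - (c_i - \hat{c}_i) X_i(\hat{c}_i, \hat{k}_i; q) && \text{(by definition)} \\
&= \rho_i(\bar{c}_i, \hat{k}_i; q) + \int_{\hat{c}_i}^{\bar{c}_i} X_i(z, \hat{k}_i; q)\,dz + (\hat{c}_i - c_i) X_i(\hat{c}_i, \hat{k}_i; q) && \text{(by hypothesis)} \\
&= \rho_i(\bar{c}_i, \hat{k}_i; q) + \int_{c_i}^{\bar{c}_i} X_i(z, \hat{k}_i; q)\,dz - \int_{c_i}^{\hat{c}_i} X_i(z, \hat{k}_i; q)\,dz + (\hat{c}_i - c_i) X_i(\hat{c}_i, \hat{k}_i; q) \\
&\le \rho_i(c_i, \hat{k}_i; q) && (X_i \text{ is non-increasing in } c_i) \\
&\le \rho_i(c_i, k_i; q) && (\text{as } \rho_i \text{ is non-decreasing in } k_i) \\
&= U_i(c_i, k_i, c_i, k_i; q). \qquad \blacksquare
\end{aligned}$$
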



4.2 Sufficiency Conditions for Optimality 

We now present sufficiency conditions for an IR, BIC mechanism
to be optimal. Let $F_i(c_i | k_i)$ and $f_i(c_i | k_i)$ denote respectively
the cumulative distribution function and the probability density
function of the cost of an agent $i$ given the capacity.


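Theorem 2 Suppose the allocation rule maximizes

$$\sum_{i=1}^{n} \int_{\underline{c}_1}^{\bar{c}_1} \!\cdots\! \int_{\underline{c}_n}^{\bar{c}_n} \int_{\underline{k}_1}^{\bar{k}_1} \!\cdots\! \int_{\underline{k}_n}^{\bar{k}_n} \left( R q_i - \left( c_i + \frac{F_i(c_i | k_i)}{f_i(c_i | k_i)} \right) \right) x_i(c_i, k_i, c_{-i}, k_{-i})\, f_1(c_1, k_1) \cdots f_n(c_n, k_n)\, dc_1 \cdots dc_n\, dk_1 \cdots dk_n \qquad (7)$$

subject to conditions 1 and 2 of Theorem 1. Also suppose that
the payment is given by

$$T_i(c_i, k_i; q) = c_i X_i(c_i, k_i; q) + \int_{c_i}^{\bar{c}_i} X_i(z, k_i; q)\,dz; \qquad (8)$$

then such a payment scheme and allocation scheme constitute
an optimal auction satisfying BIC and IR.

Proof: The auctioneer's objective is to maximize her expected
utility, which is

$$\begin{aligned}
&\sum_{i=1}^{n} \int_{\underline{c}_1}^{\bar{c}_1} \!\cdots\! \int_{\underline{c}_n}^{\bar{c}_n} \int_{\underline{k}_1}^{\bar{k}_1} \!\cdots\! \int_{\underline{k}_n}^{\bar{k}_n} \left[ R q_i x_i(b; q) - t_i(b; q) \right] f_1(c_1, k_1) \cdots f_n(c_n, k_n)\, dc_1 \cdots dc_n\, dk_1 \cdots dk_n \\
&= \sum_{i=1}^{n} \int \!\cdots\! \int \left[ x_i(b; q)(R q_i - c_i + c_i) - t_i(b; q) \right] f_1(c_1, k_1) \cdots f_n(c_n, k_n)\, dc_1 \cdots dc_n\, dk_1 \cdots dk_n \\
&= \sum_{i=1}^{n} \int \!\cdots\! \int \left( c_i x_i(b; q) - t_i(b; q) \right) f_1(c_1, k_1) \cdots f_n(c_n, k_n)\, dc_1 \cdots dc_n\, dk_1 \cdots dk_n \\
&\quad + \sum_{i=1}^{n} \int \!\cdots\! \int \left( R q_i - c_i \right) x_i(c_i, k_i, c_{-i}, k_{-i})\, f_1(c_1, k_1) \cdots f_n(c_n, k_n)\, dc_1 \cdots dc_n\, dk_1 \cdots dk_n, \qquad (9)
\end{aligned}$$

where every multiple integral is over the same ranges as in the first line.
The second term of eq. (9) is already similar to the desired form
of the objective function of the auctioneer given in eq. (7). We now
use conditions 1 and 3 of Theorem 1 to arrive at the result.
Consider the first term,

$$\begin{aligned}
&\int_{\underline{c}_1}^{\bar{c}_1} \!\cdots\! \int_{\underline{c}_n}^{\bar{c}_n} \int_{\underline{k}_1}^{\bar{k}_1} \!\cdots\! \int_{\underline{k}_n}^{\bar{k}_n} \left( c_i x_i(b; q) - t_i(b; q) \right) f_1(c_1, k_1) \cdots f_n(c_n, k_n)\, dc_1 \cdots dc_n\, dk_1 \cdots dk_n \\
&= -\int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} \rho_i(c_i, k_i; q)\, f_i(c_i, k_i)\, dc_i\, dk_i && \text{(integrating out } b_{-i}) \\
&= -\int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} \left( \rho_i(\bar{c}_i, k_i) + \int_{c_i}^{\bar{c}_i} X_i(z, k_i; q)\,dz \right) f_i(c_i, k_i)\, dc_i\, dk_i && \text{(as we need truthfulness)} \\
&= -\int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} \rho_i(\bar{c}_i, k_i)\, f_i(c_i, k_i)\, dc_i\, dk_i
 - \int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} X_i(z, k_i; q) \left( \int_{\underline{c}_i}^{z} f_i(c_i | k_i)\, dc_i \right) dz\, f_i(k_i)\, dk_i && \text{(changing order of integration)} \\
&= -\int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} \rho_i(\bar{c}_i, k_i)\, f_i(c_i, k_i)\, dc_i\, dk_i
 - \int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} X_i(z, k_i; q)\, F_i(z | k_i)\, dz\, f_i(k_i)\, dk_i \\
&= -\int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} \rho_i(\bar{c}_i, k_i)\, f_i(c_i, k_i)\, dc_i\, dk_i
 - \int_{\underline{k}_i}^{\bar{k}_i} \int_{\underline{c}_i}^{\bar{c}_i} X_i(c_i, k_i; q)\, \frac{F_i(c_i | k_i)}{f_i(c_i | k_i)}\, f_i(c_i, k_i)\, dc_i\, dk_i.
\end{aligned}$$
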
The last step is obtained by relabeling the variable of integration
and simplifying.

Here, $\rho_i(\bar{c}_i, k_i)$ denotes the utility of an agent $i$ when his
true type is $(\bar{c}_i, k_i)$. With this type profile, the auctioneer
by paying $\bar{c}_i$ can ensure both IR and IC, hence we can set
$\rho_i(\bar{c}_i, k_i) = 0$, $\forall k_i \in [\underline{k}_i, \bar{k}_i]$. Applying this in the above equation,
we get that the objective function of the auctioneer is
similar in form to eq. (7). Considering Condition 3 of Theorem 1
and setting $\rho_i(\bar{c}_i, k_i) = 0$, we get eq. (8). By construction, the
mechanism is BIC and IR. And, since the auctioneer's expected
utility is maximized, the mechanism is optimal. ■


Analogous to the literature on optimal auction [11, 16, 22], 
we assume regularity on our type distribution as follows. 


Definition 5 (Regularity) We define the virtual cost function,
$\forall i \in N$, as

$$H_i(c_i, k_i) := c_i + \frac{F_i(c_i | k_i)}{f_i(c_i | k_i)}.$$

We say that a type distribution is regular if, $\forall i$, $H_i$ is
non-decreasing in $c_i$ and non-increasing in $k_i$.
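
For intuition, the virtual cost has a simple closed form when the cost prior is uniform. The Python sketch below assumes, purely for illustration, that $c_i$ is uniform on $[\underline{c}_i, \bar{c}_i]$ independently of capacity, so that $F_i(c_i | k_i) = (c_i - \underline{c}_i)/(\bar{c}_i - \underline{c}_i)$ and $H_i(c_i, k_i) = 2 c_i - \underline{c}_i$. The function name and the uniform prior are assumptions of this example, not part of the paper.

```python
def virtual_cost_uniform(cost, c_min, c_max):
    """Virtual cost H(c) = c + F(c) / f(c) for cost ~ Uniform[c_min, c_max].

    Assumption of this sketch: the cost distribution does not depend on the
    capacity, so H is constant (hence non-increasing) in capacity.
    """
    cdf = (cost - c_min) / (c_max - c_min)   # F(c | k)
    pdf = 1.0 / (c_max - c_min)              # f(c | k)
    return cost + cdf / pdf                  # equals 2 * cost - c_min


if __name__ == "__main__":
    for c in (1.0, 2.0, 3.5, 5.0):
        print(c, virtual_cost_uniform(c, 1.0, 5.0))
```

Under this assumption $H_i$ is increasing in $c_i$ and does not vary with $k_i$, so such a distribution is regular in the sense of Definition 5.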


This assumption is not restrictive in the single dimension setting
as standard techniques of ironing are available [22]. The ironing
techniques can also be applied in the bidimensional setting whenever
the marginal cost distribution is independent of the marginal
capacity distribution.


4.3 2D-OPT: An Optimal Auction 

We now present our mechanism 2D-OPT, given in Algorithm 1.

Theorem 3 Mechanism 2D-OPT is optimal, DSIC and IR. 

Proof: We will prove that 2D-OPT satisfies Theorem 2, which
proves optimality, IR, and BIC. The allocation function (ALLOC)
allocates maximum possible units to agents in decreasing
order of $G$'s, which in turn maximizes eq. (7). This is because
eq. (7) is a linear combination of $G$'s. The monotonicity
constraint 1 of Theorem 1 is satisfied due to regularity.

Fix an agent $i$ with non-zero allocation. We will show that the
payment given to the agent $i$ by 2D-OPT is the same as
in eq. (8). We fix a bid profile $b_{-i}$ that yields non-zero allocation
to agent $i$. The payment to agent $i$ for bid profile $(b_i, b_{-i})$ as
per eq. (8) is as follows.

$$t_i(c_i, k_i, b_{-i}; q) = c_i x_i(c_i, k_i, b_{-i}; q) + \int_{c_i}^{\bar{c}_i} x_i(z, k_i, b_{-i}; q)\,dz \qquad (11)$$


ALGORITHM 1: 2D-OPT Mechanism

Input: $\forall i$, bids $b_i = (\hat{c}_i, \hat{k}_i)$, reward parameter $R$
Output: An optimal, DSIC, IR mechanism $M = (x, t)$

1 Allocation is given by $x = \mathrm{ALLOC}(N, c, k, q, L)$
2 for $i \in N$ && $x_i \ne 0$ do
3     $G_i := R q_i - H_i(b_i)$
4     $y = \mathrm{ALLOC}(N \setminus \{i\}, c_{-i}, (k_{-i} - x_{-i}), q_{-i}, x_i)$
5     Payment to $i$: $t_i = \sum_{k \in N \setminus \{i\}} y_k \max\left(G_i^{-1}(R q_k - H_k(b_k)), c_i\right) + \left(x_i - \sum_k y_k\right) \bar{c}_i$
6 end

1 Subroutine: ALLOC($N^\tau, c^\tau, k^\tau, q^\tau, L^\tau$)
  Input: $(N^\tau, c^\tau, k^\tau, q^\tau, L^\tau)$ where
      $N^\tau$ =: Set of agents,
      $c^\tau$ =: Bid vector of costs,
      $k^\tau$ =: Bid vector of capacities,
      $q^\tau$ =: Vector of qualities,
      $L^\tau$ =: Total number of units being allocated.
  Output: Vector $x$ of units allocated to each agent
2 for $\kappa \in N^\tau$ do
3     $H_\kappa(c^\tau_\kappa, k^\tau_\kappa) := c^\tau_\kappa + \frac{F_\kappa(c^\tau_\kappa | k^\tau_\kappa)}{f_\kappa(c^\tau_\kappa | k^\tau_\kappa)}$
4     $G_\kappa := R q^\tau_\kappa - H_\kappa(c^\tau_\kappa, k^\tau_\kappa)$
5 end
6 $(a_1, a_2, \ldots)$ = Sorted indices of agents in $N^\tau$ in non-increasing order of $G_\kappa$
7 $x = 0$
8 $L^{(1)} = L^\tau$
9 for $1 \le \eta \le |N^\tau|$ && $G_{a_\eta} > 0$ do
10     $x_{a_\eta} = \min(k^\tau_{a_\eta}, L^{(\eta)})$
11     $L^{(\eta+1)} = L^{(\eta)} - x_{a_\eta}$
12 end

If expectation is taken on eq. (11), we get eq. (8). The
interchange of integral and expectation required therein is valid
due to Fubini's Theorem [24] as the integrand is finite and
non-negative. We will show that 2D-OPT computes this payment
for any $b_{-i}$.

To compute the RHS of eq. (11), we first observe that when bidder
$i$ alone increases his bid, he can lose some (or all) of the
units allocated to him to bidders with lower values of $G$. Hence,
the allocation to agent $i$ as a function of his bid $z \in [c_i, \bar{c}_i]$ is
a step function as shown in Figure 1. And, the payment to be
given to agent $i$ as per eq. (11) is the shaded area.

Let $g^{(1)} < g^{(2)} < \ldots < g^{(m)}$, where $g^{(1)} > c_i$ and $g^{(m)} < \bar{c}_i$,
be the costs at which agent $i$ loses some more of his units. At
these points, the allocation also dictates that an allocated agent
$r$ either completely exhausts the units $x_i$ allocated previously
to $i$ or he himself has no more capacity left.

Figure 1: Allocation to agent $i$ as function of his bid $z$ (vertical axis: $x_i(z, k_i, b_{-i}; q)$)

On the other hand, the payment scheme of 2D-OPT first determines
the allocation of $x_i(c_i, k_i, c_{-i}, k_{-i})$ units in the absence
of $i$, as given by line 4 of Algorithm 1.

Let $U$ =: $\{j \in N \setminus \{i\} : y_j \ne 0\}$, where $y$ is the allocation
to the worker set $N \setminus \{i\}$. We will partition the set $U$ into
$V$ =: $\{j \in N \setminus \{i\} : y_j \ne 0,\ G_i(\bar{c}_i) < G_j < G_i(c_i)\}$ and
$W$ =: $\{j \in N \setminus \{i\} : y_j \ne 0,\ 0 < G_j < G_i(\bar{c}_i)\}$. Without
loss of generality, we will assume $G_i(\bar{c}_i) > 0$; otherwise we will
relabel $G_i^{-1}(0)$ as $\bar{c}_i$. No allocations are made to agents with
negative value of $G$ (see line 9 of ALLOC). Also, as the allocation
of $x_i$ units considers residual capacity $(k_{-i} - x_{-i})$ (see line 4 of
Algorithm 1), no agent with $G$ higher than $G_i(c_i)$ will have any
capacity left.

For the sake of simpler exposition, we will assume $U = V \cup W$;
the proof follows similar lines otherwise. Let $(a_1, a_2, \ldots, a_m)$ be
the indices of agents in $V$ sorted in non-increasing order of $G$.
Now, agents are allocated units from $x_i$ in the order given by
$(a_k)_{k=1}^{m}$. It follows that $G_i^{-1}(R q_{a_1} - H_{a_1}(b_{a_1})) = g^{(1)}$ and
the allocation to this agent $a_1$ corresponds to $y_{a_1}$. This forms
the term $y_{a_1} G_i^{-1}(R q_{a_1} - H_{a_1}(b_{a_1}))$ of the payment to $i$ and
corresponds to the area of rectangle ABCD. Similarly, the payment
to $i$ due to $a_2$ corresponds to the area of rectangle DEFG. This
holds for all agents in the set $V$, and rectangle PQRS denotes
the payment due to $a_m$. Finally, rectangle STUV corresponds
to agents in $W$ or units that are unallocated as there is no capacity
left in the remaining agents. The latter is captured by
the term $(x_i - \sum_k y_k)\bar{c}_i$. Hence the proposed payment computes
eq. (8), as we have shown it for any fixed $b_{-i}$.

The offered utility $\rho_i$ when all agents are truthful is
non-decreasing in the true capacity $k_i$. This is due to the greedy
nature of the allocation in ALLOC. Thus, condition 2 of Theorem 1
is satisfied.

Thus, 2D-OPT satisfies the conditions of Theorem 2. We therefore have
that the proposed mechanism is BIC, IR, and optimal.

With respect to proving DSIC, we omit a formal proof due to
space constraints and provide only a sketch. We note that the
allocation is deterministic and the payment to agent $i$ does not
depend on his bid directly and only depends on it via the allocation.
Furthermore, the payments are computed based on the allocations
that are made in the absence of $i$ for the $x_i$ units he has
been allocated currently. For every unit, the agent is paid the
best possible price he could have bid and still won the unit. ■
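
The allocation and payment steps of Algorithm 1 can be traced on a small instance. The Python sketch below is illustrative only and rests on assumptions that are not in the paper: all costs share a Uniform[`C_MIN`, `C_MAX`] prior independent of capacity (so $H(c) = 2c - \underline{c}$ and $G_i^{-1}(g) = (R q_i + \underline{c} - g)/2$), all names are hypothetical, and each per-unit price is capped at $\min(\bar{c}, G_i^{-1}(0))$, which is this sketch's way of applying the relabeling of $G_i^{-1}(0)$ as $\bar{c}_i$ mentioned in the proof.

```python
R = 10.0                 # reward parameter (assumed value)
C_MIN, C_MAX = 1.0, 5.0  # assumed common Uniform[C_MIN, C_MAX] cost prior


def g_value(quality, cost):
    """G = R*q - H(c), with H(c) = 2c - C_MIN under the uniform prior."""
    return R * quality - (2.0 * cost - C_MIN)


def g_inverse(quality, g):
    """Cost at which an agent of this quality would have G equal to g."""
    return (R * quality + C_MIN - g) / 2.0


def alloc(g_values, capacities, units):
    """Greedy ALLOC on precomputed G values (positive G only)."""
    x, remaining = [0] * len(g_values), units
    for i in sorted(range(len(g_values)), key=lambda j: g_values[j], reverse=True):
        if g_values[i] <= 0 or remaining == 0:
            break
        x[i] = min(capacities[i], remaining)
        remaining -= x[i]
    return x


def two_d_opt(costs, capacities, qualities, units):
    """Allocation and payments following the steps of Algorithm 1 (sketch)."""
    n = len(costs)
    g = [g_value(q, c) for q, c in zip(qualities, costs)]
    x = alloc(g, capacities, units)
    payments = [0.0] * n
    for i in range(n):
        if x[i] == 0:
            continue
        others = [k for k in range(n) if k != i]
        # Re-allocate agent i's units among the others' residual capacities.
        y = alloc([g[k] for k in others],
                  [capacities[k] - x[k] for k in others], x[i])
        cap = min(C_MAX, g_inverse(qualities[i], 0.0))  # top price per unit
        paid = sum(y_k * min(max(g_inverse(qualities[i], g[k]), costs[i]), cap)
                   for y_k, k in zip(y, others))
        payments[i] = paid + (x[i] - sum(y)) * cap
    return x, payments


if __name__ == "__main__":
    x, t = two_d_opt(costs=[2.0, 2.5, 1.5], capacities=[3, 2, 4],
                     qualities=[0.9, 0.8, 0.5], units=4)
    print("allocation:", x)
    print("payments:  ", t)
```

For the first agent in this instance, the payment agrees with eq. (11) evaluated by hand: he keeps 3 units for bids below 3, 2 units for bids between 3 and 3.5, and none above, giving $2 \cdot 3 + 3 \cdot 1 + 2 \cdot 0.5 = 10$.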

5 Auction with Unknown Qualities 

This section addresses the problem when qualities are not
known and are to be learnt. In order to maximize her utility, the
auctioneer will procure units from agents in a sequential manner
so that she can make future decisions based on the past learning
history. We now discuss definitions relevant in this setting.

Definition 6 (Reward Realization) A reward realization $s$
is an $n \times L$ table where the $(i, j)$ entry represents an independent
realization drawn from the true quality of the $i$-th agent when
procuring the $j$-th unit from him.

Note that the $(i, j)$ entry in the reward realization indicates the quality
of the $i$-th agent when the $j$-th unit is procured from him, and not the
$j$-th unit procured by the requester.
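
A reward realization can be pictured as a table drawn once, before any allocation is made. The Python sketch below assumes, for illustration only, that each entry is a Bernoulli draw with success probability equal to the agent's quality; the definition above only requires independent draws from the agent's quality distribution, and the function name and seeding are choices of this example.

```python
import random


def sample_reward_realization(qualities, units, seed=0):
    """Return an n x L table s, where s[i][j] is the reward observed when the
    (j+1)-th unit is procured from agent i.

    Assumption of this sketch: rewards are Bernoulli(q_i) draws.
    """
    rng = random.Random(seed)
    return [[1 if rng.random() < q else 0 for _ in range(units)]
            for q in qualities]


if __name__ == "__main__":
    table = sample_reward_realization([0.9, 0.5, 0.2], units=5)
    for row in table:
        print(row)
```

Row $i$ is indexed by how many units have been procured from agent $i$, not by the global round, which matches the remark following Definition 6.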

Definition 7 (Stochastic BIC Mechanism) We say that a
mechanism $\mathcal{M} = (x, t)$ is Stochastic BIC if truth telling by any
agent $i$ results in highest expected utility when expectation is
taken over reward realizations and type profiles of other agents.
Formally, $\forall \hat{c}_i \in [\underline{c}_i, \bar{c}_i]$, $\forall \hat{k}_i \in [\underline{k}_i, k_i]$,

$$\mathbb{E}_s\left[U_i(c_i, k_i, c_i, k_i; s)\right] \ge \mathbb{E}_s\left[U_i(\hat{c}_i, \hat{k}_i, c_i, k_i; s)\right].$$

5.1 Sufficiency Conditions for Stochastic BIC 

We now provide sufficiency conditions for a mechanism to be
stochastic BIC and IR. We begin by stating the modified
characterization theorem for the learning setting.

Theorem 4 Any mechanism that satisfies the following conditions
$\forall i \in N$, $\forall s \in [0,1]^{n \times L}$, is stochastic BIC and IR.

1. $X_i(c_i, k_i; s)$ is non-increasing in $c_i$, $\forall s$ and $\forall k_i \in [\underline{k}_i, \bar{k}_i]$.

2. $\rho_i(c_i, k_i; s)$ is non-negative, and non-decreasing in $k_i$, $\forall s$ and $\forall c_i \in [\underline{c}_i, \bar{c}_i]$.

3. $\rho_i(c_i, k_i; s) = \rho_i(\bar{c}_i, k_i; s) + \int_{c_i}^{\bar{c}_i} X_i(z, k_i; s)\,dz$.

The proof of the above theorem is similar to that of Theorem 1.
Instead of fixing a quality, we are now fixing a reward
realization. The mechanism also remains stochastic BIC and
IR when it satisfies Theorem 4 and expectation is taken over
reward realizations.

We now discuss a set of natural properties which a mechanism
in this space should ideally have. It also turns out that these properties
are sufficient to ensure BIC and IR.

Definition 8 (Well-Behaved Allocation Rule) An allocation
rule $x$ is called a Well-Behaved Allocation if:

1. Allocation to any agent $i$ for the unit being allocated in round
$j$, $x_i^j$, for any reward realization $s$, depends only on the agents'
bids and the reward realization of the $j$ units that are procured
by the auctioneer so far, and is non-increasing in terms of
the agent's cost.

2. For the unit being allocated in round $j$ and for any three distinct
agents $\{\alpha, \beta, \gamma\}$ such that the $j$-th round unit is allocated to
$\beta$, a change of bid by agent $\alpha$ should not transfer allocation
of the $j$-th round unit from $\beta$ to $\gamma$ if other quantities are fixed till
$j$ units.

3. For all reward realizations $s$, $x_i(c_i, k_i; s)$ is non-decreasing
with increase in capacity $k_i$.

As mentioned earlier, these properties are natural. Property
1 states that the allocation should not depend on any future
success realizations which are not observed. Property 2 is similar
to the Independence of Irrelevant Alternatives (IIA) property in
mechanism design theory, i.e. if an agent $i$ changes his bid
then it should not affect the allocations of other agents. Property
3 states that the allocation rule doesn't penalize an agent with
higher capacity, when other parameters are identical.

Lemma 5 If an allocation rule $x$ is well-behaved then, $\forall s$ and
$\forall k_i \in [\underline{k}_i, \bar{k}_i]$, $X_i(c_i, k_i; s)$ is non-increasing in $c_i$.

Proof: By slight abuse of notation, let $x_i(c_i, j)$ denote the number
of items procured from agent $i$ with bid $c_i$ until $j$ items
are procured. We need to prove that

$$x_i(c_i, j) \le x_i(c_i^-, j) \quad \forall c_i^- \le c_i.$$

We will prove this by induction. At $j = 1$, the condition trivially
holds by the monotonicity property of a well-behaved allocation
rule. Thus, by induction hypothesis, $x_i(c_i, j) \le x_i(c_i^-, j)$ and
we need to prove that $x_i(c_i, j+1) \le x_i(c_i^-, j+1)$. Without loss
of generality, we will consider $x_i(c_i, j) = x_i(c_i^-, j)$; otherwise
the condition is trivially satisfied.

In this case, we will show that $x_m(c_i, j) = x_m(c_i^-, j)$ $\forall m$.
Note that $x_m$ depends on bids of all the agents. Since the costs
of other agents and capacities of all the agents are held fixed,
we have dropped these dependencies for notational convenience.
Let $x_*(c_i, j)$ denote the number of units that are not procured
from agent $i$ until $j$ units, i.e. $x_*(c_i, j) = j - x_i(c_i, j)$. We will
prove that for any two units $j, j'$:

$$x_*(c_i, j) = x_*(c_i^-, j') \implies x_m(c_i, j) = x_m(c_i^-, j') \quad \forall m \ne i.$$

We prove the above statement using induction again. If $x_*(c_i, j)
= x_*(c_i^-, j') = 0$, that means all the items are procured from
agent $i$, and the statement is clearly true. Thus, by induction
hypothesis, if $x_*(c_i, j) = x_*(c_i^-, j') = x_*$, then $x_m(c_i, j) =
x_m(c_i^-, j')$ $\forall m \ne i$. Now, suppose $x_*(c_i, j) = x_*(c_i^-, j') = x_* + 1$.
Again by induction hypothesis, there exist latest rounds $j_1 < j$
and $j_1' < j'$ such that $\forall m' \ne i$

$$x_*(c_i, j_1) = x_*(c_i^-, j_1') = x_*, \qquad x_{m'}(c_i, j_1) = x_{m'}(c_i^-, j_1').$$

Since $j_1$ and $j_1'$ are the latest such rounds, units from $j_1 + 2$
to $j$ and $j_1' + 2$ to $j'$ are procured only from agent $i$; thus we
need to prove that the allocation at rounds $j_1 + 1$ and $j_1' + 1$ is the same
with bids $c_i$ and $c_i^-$ respectively. Since agent $i$ is not allocated at
these rounds, by property 2 of a well-behaved allocation rule, the
condition is satisfied. Thus, we have $x_i(c_i, j) = x_i(c_i^-, j) \implies
x_*(c_i, j) = x_*(c_i^-, j) \implies x_m(c_i, j) = x_m(c_i^-, j)$ $\forall m$.

Since the reward realization is fixed, if the number of allocations
to all the agents is the same till the $j$-th unit procured, then
by property 1 of a well-behaved allocation rule, we have
$x_i(c_i, j+1) \le x_i(c_i^-, j+1)$. ■

The following theorem guarantees a transformation of any well-behaved allocation rule into a stochastic BIC and IR mechanism.

Theorem 6 For a well-behaved allocation rule, there exists a transformation that produces a transformed allocation ($\tilde{x}$) and payment ($\tilde{t}$) such that the resulting mechanism $\mathcal{M} = (\tilde{x}, \tilde{t})$ is stochastic BIC and IR.


If we implement the following payment rule, then we get stochastic BIC by Theorem 4:

$$T_i(c_i, k_i; s) = c_i X_i(c_i, k_i; s) + \int_{c_i}^{\bar{c}_i} X_i(z, k_i; s)\,dz. \qquad (12)$$

The challenge here is to compute the integral, as the allocation is not known for bid profiles other than $c$. The allocation therein depends on how the qualities are learnt. In order to compute this integral, we adopt a sampling procedure and a transformation that uses Lemma 7, similar to [6].


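3. $P[\alpha_i(c_i) > a_i \mid \beta_i(c_i) = c_i'] = P[\alpha_i(c_i') > a_i] \quad \forall a_i \ge c_i' > c_i$.

4. Function $F(a_i, c_i) = P[\beta_i(c_i) < a_i \mid \beta_i(c_i) > c_i] = \frac{a_i - c_i}{\bar{c}_i - c_i}$ for $a_i \in [c_i, \bar{c}_i]$.

Proof: Properties 1 and 2 are immediate from the algorithm. If $\beta_i(c_i) = c_i' > c_i$, the algorithm has followed the resampling branch of Algorithm 2 (the branch taken with probability $\mu$), so $\alpha_i(c_i)$ is distributed as $\alpha_i(c_i')$ and property 3 follows. Property 4 follows from the fact that the distribution of $\beta_i(c_i)$ is uniform on the interval $[c_i, \bar{c}_i]$ conditional on the event $\beta_i(c_i) > c_i$. ■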

The algorithm that outputs the transformed allocation and the 
payment is described in Algorithm 3. 


Lemma 7 Let $F : I \to [0,1]$ be any strictly increasing function that is differentiable and satisfies $\inf_{z \in I} F(z) = 0$ and $\sup_{z \in I} F(z) = 1$. If $Y$ is a random variable with cumulative distribution function $F$, then



(13) 
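A lemma of this shape is typically used through the change-of-variables identity $\int_I g(z)\,dz = \mathbb{E}[g(Y)/F'(Y)]$, which lets a single random draw stand in for an integral. The Python sketch below checks that identity numerically for a uniform $Y$. The exact form of the identity, the integrand `g`, and all function names are assumptions made for illustration; they are not taken from the text above.

```python
import random

def sampled_integral(g, lo, hi, n_samples=200_000, seed=0):
    """Estimate the integral of g over [lo, hi] as the mean of g(Y) / F'(Y),
    where Y ~ Uniform[lo, hi] and F(y) = (y - lo) / (hi - lo)."""
    rng = random.Random(seed)
    density = 1.0 / (hi - lo)  # F'(y) is constant for the uniform distribution
    total = 0.0
    for _ in range(n_samples):
        y = rng.uniform(lo, hi)
        total += g(y) / density
    return total / n_samples

def g(z):
    # Hypothetical decreasing "expected allocation" curve, for illustration only.
    return 1.0 - z

estimate = sampled_integral(g, 0.2, 1.0)
exact = (1.0 - 0.5) - (0.2 - 0.02)  # antiderivative z - z^2/2 evaluated at 1.0 and 0.2
print(estimate, exact)
```

With a uniform $Y$ the weight $1/F'(Y)$ is just the interval length, which is why the payment term in the transformed mechanism can be computed from one resampled bid.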


Our self-resampling procedure is given in Algorithm 2, which returns vectors $\alpha$, $\beta$ based on the input bids. These vectors are then used to compute the allocation and payment.


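ALGORITHM 3: Mechanism Transformation

Input: $\forall i$, bids $c_i \in [\underline{c}_i, \bar{c}_i]$, $k_i \in [\underline{k}_i, \bar{k}_i]$, parameter $\mu \in (0,1)$, allocation rule $x$

Output: Allocation rule $\tilde{x}$ and the payment rule $\tilde{t}$

Obtain modified bids as $(\alpha, \beta) = ((\alpha_1(c_1), \beta_1(c_1)), (\alpha_2(c_2), \beta_2(c_2)), \ldots, (\alpha_n(c_n), \beta_n(c_n)))$

Allocate according to $\tilde{x}(c, k) = x(\alpha(c), k)$

Make payment to each agent $i$, $\tilde{t}_i(c, k) = c_i \tilde{x}_i(c, k) + P_i$, where

$$P_i = \begin{cases} \frac{1}{\mu}\, x_i(\alpha(c), k)\,(\bar{c}_i - c_i), & \text{if } \beta_i(c_i) > c_i \\ 0, & \text{otherwise.} \end{cases}$$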
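The transformation can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the resampling helper is a compact iterative version of the self-resampling procedure, the allocation rule is passed in as a plain function, and `cheapest_first` is a hypothetical toy rule used only to exercise the code. The bonus term uses the factor $(\bar{c}_i - c_i)/\mu$, matching the payment term that appears in Algorithm 4.

```python
import random

def self_resample(c, c_max, mu, rng):
    """Return (alpha, beta) with c <= beta <= alpha <= c_max."""
    if rng.random() >= mu:          # probability 1 - mu: keep the bid unchanged
        return c, c
    beta = rng.uniform(c, c_max)    # probability mu: resample upwards
    alpha = beta
    while rng.random() < mu:        # iterative form of the recursive resampling
        alpha = rng.uniform(alpha, c_max)
    return alpha, beta

def transform(alloc_rule, costs, caps, c_max, mu, rng):
    """One run of the transformed mechanism: returns (allocation, payments)."""
    pairs = [self_resample(c, cm, mu, rng) for c, cm in zip(costs, c_max)]
    alphas = [a for a, _ in pairs]
    alloc = alloc_rule(alphas, caps)            # allocate on modified bids
    payments = []
    for i, (c, cm) in enumerate(zip(costs, c_max)):
        beta = pairs[i][1]
        bonus = alloc[i] * (cm - c) / mu if beta > c else 0.0
        payments.append(c * alloc[i] + bonus)   # c_i * x_i + P_i
    return alloc, payments

def cheapest_first(alphas, caps, units=5):
    """Hypothetical toy rule: fill the cheapest modified bids first, up to capacity."""
    alloc = [0] * len(alphas)
    for i in sorted(range(len(alphas)), key=lambda j: alphas[j]):
        alloc[i] = min(caps[i], units)
        units -= alloc[i]
    return alloc

rng = random.Random(0)
print(transform(cheapest_first, [0.3, 0.5], [3, 4], [1.0, 1.0], 0.1, rng))
```

The bonus is paid rarely (with probability $\mu$) but scaled by $1/\mu$, so its expectation matches the integral term of the payment rule without evaluating the allocation at other bid profiles.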


ALGORITHM 2: Self-resampling Procedure
Input: bid $c_i \in [\underline{c}_i, \bar{c}_i]$, parameter $\mu \in (0,1)$
Output: $(\alpha_i, \beta_i)$ such that $\bar{c}_i \ge \alpha_i \ge \beta_i \ge c_i$

with probability $(1 - \mu)$
    $\alpha_i \leftarrow c_i$, $\beta_i \leftarrow c_i$
with probability $\mu$
    Pick $c_i' \in [c_i, \bar{c}_i]$ uniformly at random.
    $\alpha_i \leftarrow$ Recursive($c_i'$), $\beta_i \leftarrow c_i'$

function Recursive($c_i$)
    with probability $(1 - \mu)$
        return $c_i$
    with probability $\mu$
        Pick $c_i' \in [c_i, \bar{c}_i]$ uniformly at random.
        return Recursive($c_i'$)
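A minimal Python sketch of the self-resampling procedure is given below. The function and argument names are illustrative, and the recursion is written as a loop, which yields the same distribution while avoiding deep call stacks.

```python
import random

def recursive_resample(c, c_max, mu, rng):
    """Loop form of Recursive: with probability mu, redraw uniformly from [c, c_max] and repeat."""
    while rng.random() < mu:
        c = rng.uniform(c, c_max)
    return c

def self_resample(c, c_max, mu, rng):
    """Return (alpha, beta) for a bid c with upper limit c_max and resampling parameter mu."""
    if rng.random() >= mu:                 # probability 1 - mu
        return c, c                        # alpha = beta = c
    c_prime = rng.uniform(c, c_max)        # probability mu
    return recursive_resample(c_prime, c_max, mu, rng), c_prime

rng = random.Random(42)
for _ in range(5):
    print(self_resample(0.4, 1.0, 0.1, rng))
```

Because every redraw is taken from the interval above the current value, the outputs can only move upwards from the bid, which is what gives the ordering $\bar{c}_i \ge \alpha_i \ge \beta_i \ge c_i$.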


In order to compute the integral, we need certain properties to be satisfied; these are described in Lemma 8.

Lemma 8 The procedure in Algorithm 3 satisfies the following properties $\forall i \in N$:


Proof of Theorem 6: We will prove that the transformed mechanism in Algorithm 3 satisfies all the properties in Theorem 4 when the input allocation rule is well-behaved, and thus is stochastic BIC and IR. The transformed allocation and payment rules are denoted by $\tilde{x}$ and $\tilde{t}$ respectively. We denote $X_i(c_i, k_i; s) = \mathbb{E}_{b_{-i},\alpha}[x_i(\alpha(c), k; s)]$ as the expected allocation, with the expectation taken over the randomization of the algorithm and the bid profile of the other agents. Similarly, we denote $T_i(c_i, k_i; s) = \mathbb{E}_{b_{-i},\alpha,\beta}[\tilde{t}_i(\alpha(c), \beta(c), k; s)]$. For all reward realizations $s$, we will prove two properties: (1) the allocation rule $X$ is monotone in terms of costs, and (2) the expected payment rule $T$ satisfies eq. (12).

The monotonicity of the allocation rule $X$ follows from the monotonicity of $x$ (Lemma 5) and the monotonicity property of Algorithm 2 (Property 1, Lemma 8).

We now prove that $\mathbb{E}_{b_{-i},\alpha,\beta}[P_i] = \int_{c_i}^{\bar{c}_i} X_i(z, k_i; s)\,dz$, where the expectation is taken over the bids of the other players as well as over the randomization of Algorithm 3.


1. $\alpha_i(c_i)$ and $\beta_i(c_i)$ are non-decreasing functions of $c_i$.

2. (A) With probability $(1 - \mu)$, $\alpha_i(c_i) = \beta_i(c_i) = c_i$.
(B) With probability $\mu$, $\bar{c}_i \ge \alpha_i(c_i) \ge \beta_i(c_i) > c_i$.


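$$
\begin{aligned}
\mathbb{E}_{b_{-i},\alpha,\beta}[P_i]
&= \mathbb{E}_{b_{-i},\alpha,\beta_i}[P_i] && (P_i \text{ does not depend on } \beta_{-i}) \\
&= P(\beta_i > c_i)\, \mathbb{E}_{\beta_i \mid \beta_i > c_i}\, \mathbb{E}_{b_{-i},\alpha \mid \beta_i}[P_i] && (P_i = 0 \text{ if } \beta_i = c_i) \\
&= \mu\, \mathbb{E}_{\beta_i \mid \beta_i > c_i}\, \mathbb{E}_{b_{-i},\alpha \mid \beta_i}\!\left[\frac{1}{\mu}\,(\bar{c}_i - c_i)\, x_i(\alpha(c), k; s)\right] && (\text{Property 2 of Lemma 8}) \\
&= \mathbb{E}_{\beta_i \mid \beta_i > c_i}\!\left[(\bar{c}_i - c_i)\, \mathbb{E}_{b_{-i},\alpha}[x_i(\alpha_i(\beta_i), \alpha_{-i}(c_{-i}), k; s)]\right] && (\text{Property 3 of Lemma 8}) \\
&= \mathbb{E}_{\beta_i \mid \beta_i > c_i}\!\left[(\bar{c}_i - c_i)\, X_i(\beta_i, k_i; s)\right] \\
&= \int_{c_i}^{\bar{c}_i} X_i(z, k_i; s)\,dz && (\text{Lemma 7})
\end{aligned}
$$

We also have,

$$
\begin{aligned}
\rho_i(\bar{c}_i, k_i; s) &= T_i(\bar{c}_i, k_i; s) - \bar{c}_i X_i(\bar{c}_i, k_i; s) && (\text{eq. (2)}) \\
&= \bar{c}_i X_i(\bar{c}_i, k_i; s) + \int_{\bar{c}_i}^{\bar{c}_i} X_i(z, k_i; s)\,dz - \bar{c}_i X_i(\bar{c}_i, k_i; s) \\
&= 0.
\end{aligned}
$$

Thus, $\rho_i(c_i, k_i; s) = \rho_i(\bar{c}_i, k_i; s) + \int_{c_i}^{\bar{c}_i} X_i(z, k_i; s)\,dz$. Since the allocation rule is monotone in capacity, $\rho_i(c_i, k_i; s)$ is non-negative and non-decreasing in $k_i$, $\forall s$ and $\forall c_i \in [\underline{c}_i, \bar{c}_i]$. ■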


5.2 2D-UCB: A Learning Mechanism 

With the necessary machinery established, we now present the learning mechanism given in Algorithm 4. Mechanism 2D-UCB procures one unit at a time, learns the qualities, and makes the allocation as in 2D-OPT on the basis of the qualities learnt so far. The payment is computed with the help of the transformed mechanism given in Algorithm 3.

Theorem 9 2D-UCB is stochastic BIC and IR. 

Proof: We first prove that the allocation rule produced by the 2D-UCB mechanism is well-behaved. At every round, the mechanism allocates the unit to the agent with the highest value of $G_i$. The value of $G_i$ depends only on the quality learnt so far. It is monotone in cost due to the regularity assumption and the monotonicity property of Algorithm 2. Thus property 1 of a well-behaved allocation rule is satisfied. If an agent reduces his capacity, then he might lose an allocation, since no agent is allocated more than his bid capacity; this satisfies property 3. The allocation rule also satisfies property 2 (IIA), since the allocation is made to the agent with the highest $G_i$, and if agent $i$ changes his bid, it does not affect the $G_j$ of the other agents. Since the payment structure follows from Algorithm 3 and the conditions of Theorem 4 are satisfied, the resulting mechanism is stochastic BIC and IR. ■


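ALGORITHM 4: 2D-UCB Mechanism

Input: $\forall i \in N$, bids $c_i \in [\underline{c}_i, \bar{c}_i]$, $k_i \in [\underline{k}_i, \bar{k}_i]$, parameter $\mu \in (0,1)$, reward parameter $R$
Output: A mechanism $\mathcal{M} = (x, t)$

$\forall i \in N$: $\hat{q}_i^+ = 1$, $\hat{q}_i^- = 0$, $n_i = 1$
Obtain modified bids as $(\alpha, \beta) = ((\alpha_1(c_1), \beta_1(c_1)), \ldots, (\alpha_n(c_n), \beta_n(c_n)))$ using Algorithm 2
Allocate one unit to all agents and estimate the empirical quality $\hat{q}_i$ as the total reward observed from agent $i$ divided by $n_i$; set $\hat{q}_i^+ = \hat{q}_i + \sqrt{\frac{1}{2 n_i}\ln(t)}$
for $t = n$ to $L$ do
    Compute $H_i = \alpha_i + \frac{F_i(\alpha_i \mid k_i)}{f_i(\alpha_i \mid k_i)}$
    Let $i = \arg\max_{j \,:\, k_j > n_j} \left(R\hat{q}_j^+ - H_j\right)$ and $G_i = R\hat{q}_i^+ - H_i$
    if $G_i > 0$ then
        Procure the unit from agent $i$ and update $\hat{q}_i$
        $\hat{q}_i^+ = \hat{q}_i + \sqrt{\frac{1}{2 n_i}\ln(t)}$
    else
        break // Don't allocate future units to anyone
Make payment to each agent $i$, $t_i = c_i n_i + P_i$, where

$$P_i = \begin{cases} \frac{1}{\mu}\, n_i\,(\bar{c}_i - c_i), & \text{if } \beta_i > c_i \\ 0, & \text{otherwise.} \end{cases}$$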
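The allocation loop and payment step of 2D-UCB can be sketched in Python as below. Several details are assumptions made for illustration: the confidence bonus is taken as $\sqrt{\ln(t)/(2 n_i)}$, the virtual cost is passed in as a function, rewards are simulated Bernoulli draws from hypothetical true qualities, and the loop starts at $t = n + 1$ so that at most $L$ units are procured in total. The modified bids `alphas` and `betas` are assumed to come from the self-resampling procedure.

```python
import math
import random

def ucb_2d_allocate(alphas, caps, true_q, virtual_cost, R, L, rng):
    """Procure up to L units one at a time; returns the units taken from each agent."""
    n = len(alphas)
    H = [virtual_cost(a) for a in alphas]          # virtual cost at the modified bid
    counts = [1] * n                               # one exploratory unit per agent
    wins = [1.0 if rng.random() < q else 0.0 for q in true_q]
    for t in range(n + 1, L + 1):
        best, best_g = None, -math.inf
        for j in range(n):
            if counts[j] >= caps[j]:               # capacity exhausted
                continue
            ucb = wins[j] / counts[j] + math.sqrt(math.log(t) / (2 * counts[j]))
            g = R * ucb - H[j]
            if g > best_g:
                best, best_g = j, g
        if best is None or best_g <= 0:
            break                                  # no future units are allocated
        counts[best] += 1
        wins[best] += 1.0 if rng.random() < true_q[best] else 0.0
    return counts

def payments(costs, betas, c_max, counts, mu):
    """t_i = c_i * n_i + P_i, with P_i = n_i * (c_max_i - c_i) / mu when beta_i > c_i."""
    return [c * k + (k * (cm - c) / mu if b > c else 0.0)
            for c, b, cm, k in zip(costs, betas, c_max, counts)]

rng = random.Random(0)
units = ucb_2d_allocate([0.3, 0.6], [40, 40], [0.9, 0.6], lambda a: 2 * a, 30, 60, rng)
print(units, payments([0.3, 0.6], [0.3, 0.8], [1.0, 1.0], units, 0.1))
```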


6 Simulations 

In Section 5, we have presented a learning mechanism 2D-UCB, 
which embeds 2D-OPT. We have theoretically established the 
optimality of 2D-OPT when the qualities of the agents are 
known. A detailed regret analysis of our learning mechanism 
2D-UCB will be quite involved and forms an interesting future 
direction. We instead evaluate the performance of our learning 
mechanism via simulations. 

In the simulations, we compare the expected utility per unit given by 2D-UCB against the optimal benchmark 2D-OPT, which is fully aware of the underlying qualities. Another good benchmark to compare against is an $\epsilon$-separated mechanism. An $\epsilon$-separated mechanism allocates $\epsilon L$ units to all the agents irrespective of their bids. The qualities learned from the rewards observed in these rounds are then used to find the allocation and payments in the $(1 - \epsilon)L$ future rounds using 2D-OPT, and the qualities are not updated further. It is easy to verify that an $\epsilon$-separated mechanism is BIC and IR.
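The two phases of an $\epsilon$-separated mechanism can be sketched as follows. This is a rough illustration under stated assumptions: exploration gives each agent the same fixed number of units, and the exploitation step uses a greedy rule on the score $R\hat{q}_i - H_i$ as a stand-in for 2D-OPT, whose exact form is not reproduced in this section. All names are hypothetical.

```python
import random

def explore(true_q, units_per_agent, rng):
    """Exploration: procure a fixed number of units from every agent, ignoring bids."""
    estimates = []
    for q in true_q:
        rewards = sum(1 for _ in range(units_per_agent) if rng.random() < q)
        estimates.append(rewards / units_per_agent)
    return estimates

def exploit(estimates, costs, caps, units_left, R, virtual_cost):
    """Exploitation: estimates stay frozen; units go to the best positive scores first."""
    scores = [R * q - virtual_cost(c) for q, c in zip(estimates, costs)]
    order = sorted(range(len(costs)), key=lambda i: scores[i], reverse=True)
    alloc = [0] * len(costs)
    for i in order:
        if scores[i] <= 0 or units_left == 0:
            break
        alloc[i] = min(caps[i], units_left)
        units_left -= alloc[i]
    return alloc

L = 1000
rounds = round(L ** (1 / 3))          # one of the exploration lengths used in the simulations
rng = random.Random(0)
q_hat = explore([0.9, 0.6, 0.7], rounds, rng)
print(q_hat, exploit(q_hat, [0.2, 0.1, 0.5], [400, 400, 400], L - 3 * rounds, 30, lambda c: 2 * c))
```

Because the exploration allocation never looks at the bids and the estimates are frozen afterwards, an agent's bid cannot influence what is learned about him, which is the intuition behind the truthfulness of this benchmark.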
For the simulations, the number of units of the item ($L$) that the auctioneer wishes to procure is chosen at first as $10^3$ and subsequently at nine other linearly spaced steps from $10^3$ to $10^5$. We choose a pool of five agents ($N$). A unit procured from an agent $i$ yields a Bernoulli reward with mean $q_i$ drawn uniformly from the interval $[0.5, 1]$. The private types of the agents are independently distributed and the costs are drawn uniformly from $[0, 1]$. The cost and capacity are chosen to be independently distributed, and therefore the setup meets regularity. The capacity is a positive integer drawn with equal probability from a range with upper limit $L$ and a lower limit large enough to meet the uniform exploration. For this type distribution, a simple computation shows that the virtual cost function for an agent $i$ is $H_i = 2c_i$. For the $\epsilon$-separated mechanisms, we choose the number of exploration rounds as $\{L^{1/6}, L^{1/3}, L^{1/2}, L^{2/3}\}$. A Bernoulli reward of 1 from a procured unit yields a reward of $R = 30$ to the auctioneer. The performance measure used is the expected average utility per unit obtained by the auctioneer, plotted as a function of the number of units. To estimate the expected average utility, 200 independent samples are drawn from the type distribution; for each such sample, the number of units required to be procured is varied; at each value of $L$, multiple instances (100) of the reward realization are drawn from the true underlying quality. As $L$ is varied, the capacity is suitably scaled, yielding a constant average utility for the benchmark as shown in fig. 2. We choose $\mu = 0.1$ for 2D-UCB.
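The virtual cost stated above can be checked in one line, assuming the virtual cost takes the usual form $H_i(c_i) = c_i + F_i(c_i \mid k_i)/f_i(c_i \mid k_i)$ (this form is assumed here, since its definition is not part of this section) and that costs are uniform on $[0,1]$ independently of capacities:

```latex
\[
F_i(c_i \mid k_i) = c_i, \quad f_i(c_i \mid k_i) = 1 \quad (c_i \in [0,1])
\;\Longrightarrow\;
H_i(c_i) = c_i + \frac{c_i}{1} = 2c_i .
\]
\[
R\,q_i - H_i(c_i) \;\ge\; 30 \cdot 0.5 - 2 \cdot 1 = 13 > 0
\quad \text{for } q_i \in [0.5, 1],\ c_i \in [0, 1].
\]
```

Under these parameters the score $R q_i - H_i$ is positive for every agent when evaluated at the true qualities and true costs, so in this setup a unit is withheld only when capacities run out or when estimated qualities or resampled bids push the score down.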


7 Conclusion 

We have studied a class of mechanisms which yield a stochastic 
reward to the auctioneer following an allocation to an agent. 
We have presented optimal learning mechanisms which truth¬ 
fully elicit multiple private types. A corresponding welfare max¬ 
imizing version follows directly from the ideas presented in this 
paper. It would be interesting to study a setting where the al¬ 
location is over a subset of agents rather than a single agent. A 
complete characterization of a learning algorithm in this space 
is still open as we have provided only sufficient conditions. Also, 
a theoretic lower bound on regret would be interesting. 



Figure 2: Comparative study of average utility per unit 


The simulations indicate that all the mechanisms yield aver¬ 
age utilities per unit which asymptotically converge to 2D-OPT. 
The performance of 2D-UCB however is superior in the sense 
that it approaches 2D-OPT faster. 